EDS 240: Assignment #3

Author

Karol Paya

Published

February 15, 2025

1.Which option do you plan to pursue? It’s okay if this has changed since HW #1.

Infographics

2. Restate your question(s). Has this changed at all since HW #1? If yes, how so?

How Did the 2022 Drought Impact Water the City of Coalinga?

Yes, the focus of my analysis has slightly changed. In the previous homework, I was getting familiar with the datasets, I wanted to explore a broader perspective on drought impacts across California.I am now narrowing my focus to the City of Coalinga, which experienced severe drought conditions in 2022. This shift will allow for a more detailed and localized analysis, making it a stronger story for the infographic.

3. Explain which variables from your data set(s) you will use to answer your question(s), and how.

California Department of Water Resources - Urban Water Data (Drought Planning and Management)

Source: https://data.cnra.ca.gov/dataset/urban-water-data-drought

Variable 1: Shortage Level This variable indicates the severity of drought in each water district in California, ranging from level 0 (no drought risk) to level 6 (high risk of drought). This data will allow us to assess the drought severity experienced by Coalinga in comparison to other areas across the state.

Variable 2: Historical Water Delivered This dataset provides information on the distribution of water supplied to various sectors (e.g., agriculture, residential, industry, etc.). By analyzing this data, we can identify which sectors were most affected by the drought, helping to understand the impact of water shortages on different areas of the economy.

Variable 3: Historical Water Produced This column shows the sources of water and their corresponding production levels (e.g., groundwater: 1,000 acre-feet, surface water: 500 acre-feet). It also includes data on how much water was purchased to meet demand. This variable will provide insight into how much of the water supply was sourced locally versus purchased during drought periods.

Nasdaq Veles California Water Index (NQH2O) Historical Data

Source: https://www.nasdaq.com/market-activity/index/nqh2o/historical?page=1&rows_per_page=10&timeline=y10

Variable 4: Water Price This dataset tracks the price of water over time. By analyzing price fluctuations, particularly during drought events, we can estimate the financial impact of the drought on water purchases. We will be able to visualize how water prices rise during periods of scarcity and calculate the cost of meeting water demand under such conditions.

4. In HW #2, you created some exploratory data viz to better understand your data. You may already have some ideas of how you plan to formally visualize your data, but it’s incredibly helpful to look at visualizations by other creators for inspiration. Find at least two data visualizations that you could (potentially) borrow / adapt pieces from. Link to them or download and embed them into your .qmd file, and explain which elements you might borrow (e.g. the graphic form, legend design, layout, etc.).

I was inspired by a map combined with a bar graph that effectively compares shortage levels across different cities in California. I plan to borrow the graphic form, where a map shows regional drought severity, and the bar graph highlights comparisons of shortage levels.

Alt text here

Initially, I had intended to use a blue-and-white color scheme to represent water. However, after seeing visualizations that utilize an orange-and-brown palette, I believe this color scheme better represents the urgency and severity of a drought crisis, which aligns with the overall tone of the infographic.

Alt text here

I really like the idea of using the size of water drops to represent percentages. This method is visually engaging and effectively conveys the data in a way that makes it easy to understand at a glance.

Alt text here

5. Hand-draw your anticipated visualizations, then take a photo of your drawing(s) and embed it in your rendered .qmd file – note that these are not exploratory visualizations, but rather your plan for your final visualizations that you will eventually polish and submit with HW #4. You should have: a sketch of your infographic (which should include at least three component visualizations) if you are pursuing option 1

Alt text here

Import libraries
Click to view code
# Load libraries
library(here)
library(ggplot2)
library(dplyr)
library(janitor)
library(ggridges)
library(tidycensus)
library(tidyverse)
library(lubridate)
library(gt)

Load Data

Click to view code
# Read data
waterprice <- read_csv(here('data/HistoricalData_waterprice.csv'))

# Drop rows NaN values
waterprice <- drop_na(waterprice, price)
waterprice <- drop_na(waterprice, Date)

# Convert  Date column to Date type
waterprice$Date <- as.Date(waterprice$Date, format = "%m/%d/%Y")

historical <- read_csv(here('data/historical_production_delivery.csv'))

Shortage Level Bar Graph

Click to view code
# Create shortage level dataframe
city_data <- data.frame(
  City = c("Sacramento", "San Francisco", "City of Coalinga", "Bakersfield", "Los Angeles", "San Diego"),
  Shortage_Level = c(2, 2, 5, 2, 3, 2)
)

# Make city column a factor 
city_data$City <- factor(city_data$City, levels = city_data$City)

# Custom colors
custom_colors <- c("#ffc800","#ffc800", "firebrick", "#ffc800","#d0630e", "#ffc800")

# Create a horizontal bar plot 
bar_graph<-ggplot(city_data, aes(x = Shortage_Level, y = City, fill = City)) +
  geom_bar(stat = "identity") +  # Create bars
  geom_text(aes(label = Shortage_Level),  # Add text labels
            hjust = -0.1,  # Position the labels to the right of the bars
            size = 3.5) +  # Adjust the size of the labels
  scale_fill_manual(values = custom_colors) +  # Apply custom colors
  theme_minimal() +
  labs(x = "Shortage Level") +
  theme(
    axis.text.x = element_blank(),  # Remove x-axis labels
    axis.ticks.x = element_blank(),  # Remove x-axis ticks
    legend.position = "none",  # Remove legend
    panel.grid = element_blank()  # Remove gridlines
  )

# Note: I will place this plot beside the California map, the y-label will be cropped
bar_graph

Note: I will place this plot beside the California map, the y-label will be cropped

Historical Water Produced/Delivered Percentages

Data Wrangle
Click to view code
# add new column `year`
historical$year <- year(ymd(historical$start_date))

district_historical<-'coalinga-city'

# Water Source
historical_watersource <- historical %>%
    filter(water_system_name == district_historical, water_produced_or_delivered == 'water produced', year==2022)%>%
  group_by(water_type) %>%
  summarise(total_quantity = sum(quantity_acre_feet, na.rm = TRUE)) %>%
  mutate(total_yearly_quantity = sum(total_quantity, na.rm = TRUE),  # Calculate total quantity
         percent = (total_quantity / total_yearly_quantity) * 100) # Calculate %

# Where the water was supplied
historical_watersent <- historical %>%
    filter(water_system_name == district_historical, water_produced_or_delivered == 'water delivered', year==2022 )%>%
  group_by(water_type) %>%
  summarise(total_quantity = sum(quantity_acre_feet, na.rm = TRUE)) %>%
  mutate(total_yearly_quantity = sum(total_quantity, na.rm = TRUE),  # Calculate total quantity
         percent = (total_quantity / total_yearly_quantity) * 100) # Calculate %

Note: The percentages will be represented as a water drop (not in a table)

Tables
Click to view code
# Table 1: Water Source (water produced)
historical_watersource_table <- historical_watersource %>%
  gt() %>%
  tab_header(
    title = "Water Source - Water Produced"
  ) %>%
  cols_label(
    water_type = "Water Type",
    total_quantity = "Total Quantity (Acre-feet)",
    percent = "Percentage of Total"
  )  %>%
  fmt_number(
    columns = c("total_quantity", "percent"),
    decimals = 2
  )

# Table 2: Water Sent (water delivered)
historical_watersent_table <- historical_watersent %>%
  gt() %>%
  tab_header(
    title = "Water Sent - Water Delivered"
  ) %>%
  cols_label(
    water_type = "Water Type",
    total_quantity = "Total Quantity (Acre-feet)",
    percent = "Percentage of Total"
  ) %>%
  fmt_number(
    columns = c("total_quantity", "percent"),
    decimals = 2
  )
# Display table1
historical_watersource_table
Water Source - Water Produced
Water Type Total Quantity (Acre-feet) total_yearly_quantity Percentage of Total
groundwater wells 0.00 3721.701 0.00
non-potable (total excluded recycled) 0.00 3721.701 0.00
non-potable water sold to another pws 0.00 3721.701 0.00
purchased or received from another pws 0.00 3721.701 0.00
recycled 0.00 3721.701 0.00
sold to another pws 0.00 3721.701 0.00
surface water 3,721.70 3721.701 100.00
# Display table2
historical_watersent_table
Water Sent - Water Delivered
Water Type Total Quantity (Acre-feet) total_yearly_quantity Percentage of Total
agriculture 0.00 2901.173 0.00
commercial/institutional 1,108.85 2901.173 38.22
industrial 176.83 2901.173 6.10
landscape irrigation 165.14 2901.173 5.69
multi-family residential 234.83 2901.173 8.09
other 0.00 2901.173 0.00
other pws 0.00 2901.173 0.00
single-family residential 1,215.52 2901.173 41.90

Historical Water Price Plot

Click to view code
# Create the time series plot with a vertical line for 2022 and a drought annotation

timeseries<-ggplot(waterprice, aes(x = Date, y = price)) +
  geom_line(color = "firebrick") +  
  geom_area(fill = "orange", alpha = 0.7) +  # Fill the area under the line with orange color
  labs(
    title = "Nasdaq Veles California Water Index (NQH2O)",
    x = "Date",
    y = "Price ($)",
    caption = "Source: Nasdaq") +
  theme_minimal() +
  geom_vline(xintercept = as.Date("2022-09-15"), linetype = "dashed", color = "red", size = 0.7) +  # Add vertical line at 2022
  annotate("text", x = as.Date("2022-01-01"), y = max(waterprice$price), label = "Drought Event", vjust = -0.5, hjust = -0.9, color = "red", size = 4)+
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),  # Center and bold the title
    plot.background = element_rect(fill = "white", color = "black", size = 1)
  )
timeseries

7. Answer the following questions:

a. What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R? If you struggled with mocking up any of your three visualizations (from #6, above), describe those challenges here.

The most challenging part for me has been crafting an interesting story with the available data. I find it difficult to make the visualizations engaging and meaningful with the limited information I have. I feel my graphs are somewhat basic. I’ll continue working on refining these plots to improve them.

b. What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?

I haven’t used any new packages that weren’t already covered in class.

c. What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?

Perhaps more tips on how to enhance the visuals and impact of my graphs, whether there are additional information or design elements that could make my infographic more engaging and informative.